Final Project - Indicators of Anxiety or Depression Based on Reported Frequency of Symptoms During Last 7 Days
Author
Ian Walsh & Logan Rosell
Published
November 12, 2025
Research Question: How did anxiety and depression levels differ between states and regions following the outbreak of COVID-19 in the United States?
Data Cleaning
Import libraries and dataset
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import warnings

warnings.simplefilter(action='ignore', category=pd.errors.SettingWithCopyWarning)

df = pd.read_csv("./Datasets/Indicators_of_Anxiety_or_Depression_Based_on_Reported_Frequency_of_Symptoms_During_Last_7_Days.csv")
df.head()
| | Indicator | Group | State | Subgroup | Phase | Time Period | Time Period Label | Time Period Start Date | Time Period End Date | Value | Low CI | High CI | Confidence Interval | Quartile Range |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Symptoms of Depressive Disorder | National Estimate | United States | United States | 1 | 1 | Apr 23 - May 5, 2020 | 04/23/2020 | 05/05/2020 | 23.5 | 22.7 | 24.3 | 22.7 - 24.3 | NaN |
| 1 | Symptoms of Depressive Disorder | By Age | United States | 18 - 29 years | 1 | 1 | Apr 23 - May 5, 2020 | 04/23/2020 | 05/05/2020 | 32.7 | 30.2 | 35.2 | 30.2 - 35.2 | NaN |
| 2 | Symptoms of Depressive Disorder | By Age | United States | 30 - 39 years | 1 | 1 | Apr 23 - May 5, 2020 | 04/23/2020 | 05/05/2020 | 25.7 | 24.1 | 27.3 | 24.1 - 27.3 | NaN |
| 3 | Symptoms of Depressive Disorder | By Age | United States | 40 - 49 years | 1 | 1 | Apr 23 - May 5, 2020 | 04/23/2020 | 05/05/2020 | 24.8 | 23.3 | 26.2 | 23.3 - 26.2 | NaN |
| 4 | Symptoms of Depressive Disorder | By Age | United States | 50 - 59 years | 1 | 1 | Apr 23 - May 5, 2020 | 04/23/2020 | 05/05/2020 | 23.2 | 21.5 | 25.0 | 21.5 - 25.0 | NaN |
Rename the “Value” column to “Percent of Population” to more accurately describe what the data measures
df.rename(columns={"Value": "Percent of Population"}, inplace=True)
Filter to state-level data only and drop unnecessary columns: Group and Subgroup are redundant, and the time period and CI columns are just combinations of other columns’ data.
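The filtering step described above could be sketched as follows. This is a minimal illustration on a tiny hand-made frame, not the project's actual cell: the group label "By State", the exact list of dropped columns, and all the numbers in the sample frame are assumptions made for the example.

```python
import pandas as pd

# Tiny stand-in for the real dataset; all values are illustrative only.
df = pd.DataFrame({
    "Indicator": ["Symptoms of Depressive Disorder"] * 3,
    "Group": ["National Estimate", "By State", "By State"],
    "State": ["United States", "Alabama", "Alaska"],
    "Subgroup": ["United States", "Alabama", "Alaska"],
    "Time Period Label": ["Apr 23 - May 5, 2020"] * 3,
    "Percent of Population": [23.5, 20.0, 21.0],
    "Low CI": [22.7, 18.0, 19.0],
    "High CI": [24.3, 22.0, 23.0],
    "Confidence Interval": ["22.7 - 24.3", "18.0 - 22.0", "19.0 - 23.0"],
})

# Assumption: state-level rows are labelled "By State" in the Group column.
state_data = df[df["Group"] == "By State"].copy()

# Assumption: these are the redundant columns the text refers to.
redundant = ["Group", "Subgroup", "Time Period Label", "Confidence Interval"]
state_data = state_data.drop(columns=redundant).reset_index(drop=True)

print(state_data)
```

Taking an explicit .copy() after filtering means later column assignments operate on an independent frame, which would also avoid the SettingWithCopyWarning that the import cell suppresses.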
state_data['Phase'].unique()
# There are 2 values that contain dates which are already stored in other columns, so we can remove these dates
state_data['Phase'] = state_data['Phase'].str.split(' ', expand=True).get(0)
state_data['Indicator'] = pd.Categorical(
    state_data['Indicator'],
    categories=['Symptoms of Depressive Disorder', 'Symptoms of Anxiety Disorder', 'Symptoms of Anxiety Disorder or Depressive Disorder'])
state_data['Phase'] = pd.Categorical(
    state_data['Phase'],
    categories=['1', '2', '3', '3.1', '3.2', '3.3', '3.4', '3.5', '3.6', '3.7', '3.8', '3.9', '3.10'])
state_data['Time Period Start Date'] = pd.to_datetime(state_data['Time Period Start Date']).dt.date
state_data['Time Period End Date'] = pd.to_datetime(state_data['Time Period End Date']).dt.date
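One thing worth checking after conversions like these: pd.Categorical turns any value that is not listed in categories into NaN without raising an error. The sketch below uses made-up Phase labels (the exact format of the date-bearing labels in the real data is an assumption) to show the split and a check for unlisted values. The ordered=True argument is not in the original code and is included here only as an option.

```python
import pandas as pd

# Hypothetical Phase labels; the real dataset's formatting may differ.
phase = pd.Series(["1", "3 (Oct 28 - Dec 21)", "3.1", "4.0"])

# Keep only the text before the first space, dropping the embedded dates.
phase_clean = phase.str.split(" ").str[0]

# Shortened category list for the example.
categories = ["1", "2", "3", "3.1"]
phase_cat = pd.Categorical(phase_clean, categories=categories, ordered=True)

# Values absent from `categories` become NaN; list them so nothing is lost silently.
unlisted = phase_clean[pd.isna(phase_cat)].unique().tolist()
print(unlisted)
```

If the list printed here is not empty on the real data, the categories list is probably missing a phase.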
EDA
Looking at some graphs
# Histogram of values for all states
plt1 = sns.histplot(state_data, x='Percent of Population', hue='Indicator', alpha=0.5)
plt.title('Histogram of Percent of Population by Indicator')
plt.show()
# Unweighted mean across states for each time period and indicator
national_avgs = state_data.groupby(['Time Period Start Date', 'Indicator'], observed=False).agg(
    nat_means=('Percent of Population', 'mean'))
nat_avg_plt = sns.lineplot(national_avgs, x='Time Period Start Date', y='nat_means', hue='Indicator')
plt.xticks(rotation=45)
plt.title("Percent of Population Over Time")
plt.ylabel('Percent of Population')
plt.show()